This tutorial demonstrates how to integrate transcriptomics data with connectomics data to identify neurons expressing specific genes and analyze their connectivity patterns.
Voltage-gated calcium channels (VGCCs) play crucial roles in neuronal signaling. The Ca-α1T gene encodes a low-voltage-activated (T-type) calcium channel subunit that is particularly important in visual processing circuits.
Key biological insights: - Ca-α1T (FlyBase: FBgn0264386) - T-type calcium channels enable low-threshold spiking and burst firing, inc. calcium spikes in hyperpolarised neurons. - Known to be expressed in the optic lobes, but not in which types originally. - See Ishida et al. (2025) for example in central complex
Tutorial goals: 1. Load transcriptomics data from visual system cell types 2. Identify cell types expressing Ca-α1T 3. Map those cell types to the FAFB connectome 4. Analyze their connectivity patterns 5. Visualise a network of Ca-α1T+ neurons and their strongest synaptic partners
Currently working with:
The expression matrix contains normalized gene expression values for visual system cell types at 96 hours of development.
# Construct path to transcriptomics data
expr_matrix_path <- file.path(data_path, "transcriptomics", "visual_system",
"expr_matrix_96h_V1.1.tsv")
# Read expression matrix
# First column is gene names, remaining columns are cell types
if (use_gcs) {
# Download from GCS
temp_file <- tempfile(fileext = ".tsv")
system(paste("gsutil cp", expr_matrix_path, temp_file))
expr_matrix <- read.delim(temp_file, row.names = 1, check.names = FALSE)
unlink(temp_file)
} else {
expr_matrix <- read.delim(expr_matrix_path, row.names = 1, check.names = FALSE)
}
cat("Expression matrix dimensions:\n")
## Expression matrix dimensions:
cat(" Genes:", nrow(expr_matrix), "\n")
## Genes: 12037
cat(" Cell types:", ncol(expr_matrix), "\n")
## Cell types: 61
cat("\nFirst few genes and cell types:\n")
##
## First few genes and cell types:
expr_matrix[1:5, 1:5]
## C2 C3 Dm1 Dm10 Dm11
## 128up 0.16 0.14 0.52 0.24 0.20
## 14-3-3epsilon 22.63 22.94 14.94 22.29 25.17
## 14-3-3zeta 20.64 24.02 50.41 21.47 35.20
## 140up 0.09 0.15 0.13 0.20 0.15
## 18SrRNA-Psi:CR41602 0.61 0.56 1.22 0.70 0.45
The cell type key maps between different naming conventions across datasets: - Nern_type: maleCNS naming - Matsliah_type / Schlegel_type: FAFB/BANC naming
# Construct path to cell type key
cell_type_key_path <- file.path(data_path, "transcriptomics", "visual_system",
"cell_types_V1.1a.tsv")
# Read cell type key
if (use_gcs) {
temp_file <- tempfile(fileext = ".tsv")
system(paste("gsutil cp", cell_type_key_path, temp_file))
cell_type_key <- read.delim(temp_file, stringsAsFactors = FALSE)
unlink(temp_file)
} else {
cell_type_key <- read.delim(cell_type_key_path, stringsAsFactors = FALSE)
}
cat("Cell type key dimensions:", nrow(cell_type_key), "cell types\n")
## Cell type key dimensions: 62 cell types
cat("\nColumn names:\n")
##
## Column names:
print(names(cell_type_key))
## [1] "class1" "type2" "subtype2" "Nern_type"
## [5] "Matsliah_type" "Schlegel_type"
cat("\nFirst few rows:\n")
##
## First few rows:
head(cell_type_key, 10)
## class1 type2 subtype2 Nern_type Matsliah_type Schlegel_type
## 1 N C2 C2 C2 C2 C2
## 2 N C3 C3 C3 C3 C3
## 3 N Dm1 Dm1 Dm1 Dm1 Dm1
## 4 N Dm2 Dm2 Dm2 Dm2 Dm2
## 5 N Dm3 Dm3a Dm3a Dm3p Dm3
## 6 N Dm3 Dm3b Dm3b Dm3q Dm3
## 7 N Dm4 Dm4 Dm4 Dm4 Dm4
## 8 N Dm8 Dm8 Dm8a,Dm8b yDm8,pDm8 Dm8
## 9 N Dm9 Dm9 Dm9 Dm9 Dm9
## 10 N Dm10 Dm10 Dm10 Dm10 Dm10
Before focusing on Ca-α1T, let’s explore the expression patterns of voltage-gated ion channels (VGICs) across cell types to provide context.
# Search for voltage-gated ion channel genes
# Look for genes containing typical VGIC patterns:
# - Calcium channels: Ca-alpha, Ca-beta, cacophony (cac)
# - Sodium channels: para, NaCP60E
# - Potassium channels: Shaker, Shaw, Shab, Shal, eag, erg, elk
vgic_patterns <- c(
"^Ca-alpha", # Calcium channel alpha subunits
"^Ca-beta", # Calcium channel beta subunits
"^cac", # Cacophony (voltage-gated Ca channel)
"^para", # Paralytic (voltage-gated Na channel)
"^NaCP", # Sodium channel proteins
"^Sh$", # Shaker (Kv1 family)
"^Shaw", # Shaw (Kv3 family)
"^Shab", # Shab (Kv2 family)
"^Shal", # Shal (Kv4 family)
"^eag$", # Ether-a-go-go (Kv10 family)
"^erg$", # Erg (Kv11 family)
"^elk$" # Elk (Kv12 family)
)
# Find matching genes in expression matrix
vgic_genes <- grep(paste(vgic_patterns, collapse = "|"),
rownames(expr_matrix),
value = TRUE,
ignore.case = FALSE)
cat("Found", length(vgic_genes), "voltage-gated ion channel genes:\n")
## Found 14 voltage-gated ion channel genes:
print(vgic_genes)
## [1] "Ca-alpha1D" "Ca-alpha1T" "Ca-beta" "NaCP60E" "Sh"
## [6] "Shab" "Shal" "Shaw" "Shawl" "cac"
## [11] "cact" "cactin" "eag" "para"
# Subset expression matrix to VGIC genes
vgic_expr <- expr_matrix[vgic_genes, , drop = FALSE]
cat("\nVGIC expression matrix:\n")
##
## VGIC expression matrix:
cat(" Genes:", nrow(vgic_expr), "\n")
## Genes: 14
cat(" Cell types:", ncol(vgic_expr), "\n")
## Cell types: 61
cat(" Value range:", min(vgic_expr), "to", max(vgic_expr), "\n")
## Value range: 0 to 134.8
# Show summary statistics per gene
vgic_stats <- data.frame(
gene = rownames(vgic_expr),
mean_expr = rowMeans(vgic_expr),
max_expr = apply(vgic_expr, 1, max),
cv = apply(vgic_expr, 1, sd) / rowMeans(vgic_expr) # Coefficient of variation
) %>%
arrange(desc(mean_expr))
cat("\nTop 10 most highly expressed VGICs:\n")
##
## Top 10 most highly expressed VGICs:
print(head(vgic_stats, 10))
## gene mean_expr max_expr cv
## Sh Sh 31.025410 84.60 0.3743809
## para para 30.002459 134.80 0.6002796
## Ca-alpha1T Ca-alpha1T 9.244262 72.42 1.3780203
## NaCP60E NaCP60E 6.684262 33.70 0.7172604
## cac cac 6.504426 10.80 0.2341405
## Shal Shal 2.964918 7.38 0.5290207
## eag eag 2.679180 11.84 0.8033499
## Shab Shab 2.651148 18.60 1.0590988
## Ca-alpha1D Ca-alpha1D 2.404590 4.87 0.2838044
## Ca-beta Ca-beta 1.682295 7.81 0.5991748
# Prepare data for heatmap
# Log-transform for better visualisation (add pseudocount to avoid log(0))
vgic_expr_log <- log2(vgic_expr + 1)
# Create heatmap using pheatmap
library(pheatmap)
library(viridis)
# Annotate rows by channel type
gene_annotation <- data.frame(
Channel_Type = case_when(
grepl("^Ca-", rownames(vgic_expr_log)) ~ "Calcium",
grepl("^cac", rownames(vgic_expr_log)) ~ "Calcium",
grepl("^para|^NaCP", rownames(vgic_expr_log)) ~ "Sodium",
grepl("^Sh|^Shaw|^Shab|^Shal|^eag|^erg|^elk", rownames(vgic_expr_log)) ~ "Potassium",
TRUE ~ "Other"
),
row.names = rownames(vgic_expr_log)
)
# Define colors for channel types
channel_colors <- list(
Channel_Type = c("Calcium" = "#E41A1C", "Sodium" = "#377EB8",
"Potassium" = "#4DAF4A", "Other" = "grey70")
)
# Create heatmap with viridis color scale
heatmap_plot <- pheatmap(
vgic_expr_log,
color = viridis(100),
cluster_rows = TRUE,
cluster_cols = TRUE,
clustering_distance_rows = "correlation",
clustering_distance_cols = "correlation",
annotation_row = gene_annotation,
annotation_colors = channel_colors,
show_rownames = TRUE,
show_colnames = TRUE,
fontsize_row = 8,
fontsize_col = 8,
main = "Voltage-Gated Ion Channel Expression Across Visual Cell Types",
angle_col = 90,
cellwidth = 15,
cellheight = 12,
border_color = NA
)
# Save high-resolution version
png(file.path(img_dir, "vgic_heatmap.png"),
width = 18, height = 10, units = "in", res = 300)
pheatmap(
vgic_expr_log,
color = viridis(100),
cluster_rows = TRUE,
cluster_cols = TRUE,
clustering_distance_rows = "correlation",
clustering_distance_cols = "correlation",
annotation_row = gene_annotation,
annotation_colors = channel_colors,
show_rownames = TRUE,
show_colnames = TRUE,
fontsize_row = 8,
fontsize_col = 8,
main = "Voltage-Gated Ion Channel Expression Across Visual Cell Types",
angle_col = 90,
cellwidth = 15,
cellheight = 12,
border_color = NA
)
dev.off()
## quartz_off_screen
## 3
cat("\nHeatmap saved to:", file.path(img_dir, "vgic_heatmap.png"), "\n")
##
## Heatmap saved to: images/tutorial_05/vgic_heatmap.png
The heatmap reveals: - Cell-type specific expression patterns: Different VGICs show distinct expression profiles - Calcium channel diversity: Multiple Ca-α subtypes with varying expression - Potassium channel complexity: Diverse K+ channels enable varied firing properties - Co-expression patterns: Clustered genes may indicate functional relationships
Now let’s focus on Ca-α1T, a low-voltage-activated T-type calcium channel particularly important in visual processing.
Let’s find which cell types express the voltage-gated calcium channel Ca-α1T.
# Check if Ca-alpha1T is in the expression matrix
if (!"Ca-alpha1T" %in% rownames(expr_matrix)) {
stop("Ca-alpha1T not found in expression matrix!")
}
# Extract Ca-alpha1T expression across all cell types
ca_alpha1t_expr <- expr_matrix["Ca-alpha1T", , drop = FALSE]
ca_alpha1t_expr <- as.data.frame(t(ca_alpha1t_expr))
colnames(ca_alpha1t_expr) <- "expression"
ca_alpha1t_expr$cell_type <- rownames(ca_alpha1t_expr)
# Sort by expression level
ca_alpha1t_expr <- ca_alpha1t_expr %>%
arrange(desc(expression))
cat("Ca-α1T expression statistics:\n")
## Ca-α1T expression statistics:
cat(" Mean expression:", mean(ca_alpha1t_expr$expression), "\n")
## Mean expression: 9.244262
cat(" Median expression:", median(ca_alpha1t_expr$expression), "\n")
## Median expression: 6.38
cat(" Max expression:", max(ca_alpha1t_expr$expression), "\n")
## Max expression: 72.42
cat(" 90th percentile:", quantile(ca_alpha1t_expr$expression, 0.9), "\n")
## 90th percentile: 19.19
# Show top expressors
cat("\nTop 10 Ca-α1T expressing cell types:\n")
##
## Top 10 Ca-α1T expressing cell types:
print(ca_alpha1t_expr[1:10, ])
## expression cell_type
## Dm9 72.42 Dm9
## Pm4 43.87 Pm4
## Dm3a 43.36 Dm3a
## Dm3b 40.81 Dm3b
## Tm2 26.90 Tm2
## Mi1 21.69 Mi1
## C2 19.19 C2
## L5 17.16 L5
## LLPC3 14.70 LLPC3
## LPLC1 13.38 LPLC1
# Define expression threshold (90th percentile)
expr_threshold <- quantile(ca_alpha1t_expr$expression, 0.9)
# Add category for plotting
ca_alpha1t_expr <- ca_alpha1t_expr %>%
mutate(expressing = expression >= expr_threshold)
# Create bar plot
p_expr <- ggplot(ca_alpha1t_expr, aes(x = reorder(cell_type, expression),
y = expression,
fill = expressing)) +
geom_col() +
geom_hline(yintercept = expr_threshold, linetype = "dashed", color = "red", size = 0.8) +
scale_fill_manual(values = c("FALSE" = "grey70", "TRUE" = "steelblue"),
labels = c("FALSE" = "Below threshold", "TRUE" = "High expression"),
name = "Expression Level") +
labs(
title = "Ca-α1T Expression Across Visual System Cell Types",
subtitle = paste("Threshold at 90th percentile (", round(expr_threshold, 2), ")", sep = ""),
x = "Cell Type",
y = "Expression Level"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 8),
plot.title = element_text(face = "bold", size = 14),
legend.position = "top"
)
save_plot(p_expr, "ca_alpha1t_expression", width = 16, height = 8)
ggplotly(p_expr)
# Identify high expressors
high_expressors <- ca_alpha1t_expr %>%
filter(expressing) %>%
pull(cell_type)
cat("\nCell types with high Ca-α1T expression (≥90th percentile):\n")
##
## Cell types with high Ca-α1T expression (≥90th percentile):
print(high_expressors)
## [1] "Dm9" "Pm4" "Dm3a" "Dm3b" "Tm2" "Mi1" "C2"
# Construct path to FAFB metadata
meta_path <- construct_path(data_path, dataset, "meta")
# Read metadata
meta_full <- read_feather_smart(meta_path, gcs_filesystem = gcs_fs)
## Reading from GCS: gs://sjcabs_2025_data/fafb/fafb_783_meta.feather
## Downloading from GCS... (this may take several minutes for large files)
## Downloaded 8.82 MB from GCS
## Loading into memory with Arrow...
## ✓ Done! Loaded 139215 rows
cat("FAFB metadata loaded:\n")
## FAFB metadata loaded:
cat(" Total neurons:", nrow(meta_full), "\n")
## Total neurons: 139215
cat(" Columns:", ncol(meta_full), "\n")
## Columns: 17
# Display first few rows
head(meta_full %>% select(all_of(dataset_id), cell_type, super_class, neurotransmitter_predicted))
## # A tibble: 6 × 4
## fafb_783_id cell_type super_class neurotransmitter_predic…¹
## <chr> <chr> <chr> <chr>
## 1 720575940630915748 ORN_DM6 sensory acetylcholine
## 2 720575940620993718 ORN_VA1v sensory acetylcholine
## 3 720575940603453286 M_adPNm5 central_brain_intrinsic acetylcholine
## 4 720575940630003472 M_adPNm4 central_brain_intrinsic acetylcholine
## 5 720575940614316731 M_adPNm5 central_brain_intrinsic acetylcholine
## 6 720575940629587671 M_adPNm5 central_brain_intrinsic acetylcholine
## # ℹ abbreviated name: ¹neurotransmitter_predicted
The cell type key uses Matsliah_type and Schlegel_type columns for FAFB/BANC naming.
# For FAFB, we'll use Schlegel_type (primary) and Matsliah_type (fallback)
# These map to the cell_type column in FAFB metadata
# Get high expressing cell types from transcriptomics
high_expr_celltypes_transcr <- high_expressors
# Map to FAFB naming using both Schlegel_type and Matsliah_type
fafb_cell_types <- cell_type_key %>%
filter(subtype2 %in% high_expr_celltypes_transcr) %>%
select(Schlegel_type, Matsliah_type) %>%
pivot_longer(cols = everything(), values_to = "fafb_name") %>%
pull(fafb_name) %>%
unique()
cat("Mapped", length(fafb_cell_types), "cell type names to FAFB\n")
## Mapped 9 cell type names to FAFB
cat("\nFAFB cell type names:\n")
##
## FAFB cell type names:
print(fafb_cell_types)
## [1] "C2" "Dm3" "Dm3p" "Dm3q" "Dm9" "Mi1" "Pm4" "Pm05" "Tm2"
# Find neurons of these types in FAFB metadata
ca_alpha1t_neurons <- meta_full %>%
filter(cell_type %in% fafb_cell_types)
cat("\nFound", nrow(ca_alpha1t_neurons), "Ca-α1T+ neurons in FAFB\n")
##
## Found 6815 Ca-α1T+ neurons in FAFB
# Show cell type distribution
if (nrow(ca_alpha1t_neurons) > 0) {
cat("\nCell type distribution:\n")
print(table(ca_alpha1t_neurons$cell_type))
# Get neuron IDs for connectivity analysis
ca_alpha1t_ids <- ca_alpha1t_neurons[[dataset_id]]
cat("\nNumber of Ca-α1T+ neuron IDs:", length(ca_alpha1t_ids), "\n")
} else {
cat("\nWarning: No matching neurons found in FAFB metadata!\n")
cat("This may be due to naming differences or sparse annotation.\n")
cat("Consider using a broader search or different cell type mappings.\n")
}
##
## Cell type distribution:
##
## C2 Dm3p Dm3q Dm9 Mi1 Pm05 Tm2
## 1424 1005 902 203 1582 148 1551
##
## Number of Ca-α1T+ neuron IDs: 6815
# Construct path to FAFB edgelist
edgelist_path <- construct_path(data_path, dataset, "edgelist_simple")
# Read edgelist (this may take a minute for large datasets)
cat("Loading FAFB edgelist (this may take 1-2 minutes)...\n")
## Loading FAFB edgelist (this may take 1-2 minutes)...
edgelist_full <- read_feather_smart(edgelist_path, gcs_filesystem = gcs_fs)
## Reading from GCS: gs://sjcabs_2025_data/fafb/fafb_783_simple_edgelist.feather
## Downloading from GCS... (this may take several minutes for large files)
## Downloaded 288.61 MB from GCS
## Loading into memory with Arrow...
## ✓ Done! Loaded 15023799 rows
cat("\nEdgelist loaded:\n")
##
## Edgelist loaded:
cat(" Total connections:", nrow(edgelist_full), "\n")
## Total connections: 15023799
cat(" Columns:", paste(names(edgelist_full), collapse = ", "), "\n")
## Columns: pre, post, count, norm, total_input
# Display structure
head(edgelist_full)
## # A tibble: 6 × 5
## pre post count norm total_input
## <chr> <chr> <int> <dbl> <int>
## 1 720575940630598927 720575940596125868 13 0.0681 191
## 2 720575940638698842 720575940596125868 11 0.0576 191
## 3 720575940616595515 720575940596125868 9 0.0471 191
## 4 720575940625903678 720575940596125868 8 0.0419 191
## 5 720575940644945608 720575940596125868 12 0.0628 191
## 6 720575940621780812 720575940596125868 2 0.0105 191
We’ll identify the strongest synaptic partners of Ca-α1T+ neurons using two criteria: 1. Normalised weight ≥ 90th percentile (strong relative connectivity) 2. Synapse count > 5 (minimum absolute connectivity)
if (length(ca_alpha1t_ids) > 0) {
# Filter edgelist for connections involving Ca-α1T+ neurons
# Include both pre and post connections
edgelist_ca <- edgelist_full %>%
filter(pre %in% ca_alpha1t_ids | post %in% ca_alpha1t_ids)
cat("Connections involving Ca-α1T+ neurons:", nrow(edgelist_ca), "\n")
# Calculate 90th percentile of normalised weight
norm_threshold <- quantile(edgelist_ca$norm, 0.9, na.rm = TRUE)
cat("90th percentile of norm:", norm_threshold, "\n")
# Filter for strong connections (norm ≥ 90th percentile AND count > 5)
edgelist_strong <- edgelist_ca %>%
filter(norm >= norm_threshold, count > 5)
cat("\nStrong connections (≥90th percentile norm AND count>5):", nrow(edgelist_strong), "\n")
# Get all neuron IDs in the strong network (Ca-α1T+ neurons + partners)
network_ids <- unique(c(edgelist_strong$pre, edgelist_strong$post))
cat("Total neurons in network:", length(network_ids), "\n")
# Get metadata for network neurons
network_meta <- meta_full %>%
filter(!!sym(dataset_id) %in% network_ids) %>%
mutate(is_ca_alpha1t = !!sym(dataset_id) %in% ca_alpha1t_ids)
cat(" Ca-α1T+ neurons:", sum(network_meta$is_ca_alpha1t), "\n")
cat(" Partner neurons:", sum(!network_meta$is_ca_alpha1t), "\n")
} else {
cat("No Ca-α1T+ neurons found - skipping connectivity analysis\n")
}
## Connections involving Ca-α1T+ neurons: 885607
## 90th percentile of norm: 0.0411
##
## Strong connections (≥90th percentile norm AND count>5): 77905
## Total neurons in network: 44236
## Ca-α1T+ neurons: 6775
## Partner neurons: 37456
if (exists("edgelist_strong") && nrow(edgelist_strong) > 0) {
# Compute network statistics
cat("\nNetwork Statistics (Neuron-Level):\n")
cat("=" , rep("=", 50), "\n", sep = "")
# Connection counts
cat("Total connections:", nrow(edgelist_strong), "\n")
cat("Average synapse count:", mean(edgelist_strong$count), "\n")
cat("Average normalised weight:", mean(edgelist_strong$norm), "\n")
# Partner statistics
ca_outputs <- edgelist_strong %>%
filter(pre %in% ca_alpha1t_ids) %>%
count(pre) %>%
nrow()
ca_inputs <- edgelist_strong %>%
filter(post %in% ca_alpha1t_ids) %>%
count(post) %>%
nrow()
cat("\nCa-α1T+ neurons with outputs:", ca_outputs, "\n")
cat("Ca-α1T+ neurons with inputs:", ca_inputs, "\n")
# Top partners
cat("\nTop 10 strongest postsynaptic partners (neuron-level):\n")
top_post <- edgelist_strong %>%
filter(pre %in% ca_alpha1t_ids) %>%
group_by(post) %>%
summarise(
total_count = sum(count),
mean_norm = mean(norm),
n_connections = n()
) %>%
arrange(desc(mean_norm)) %>%
head(10) %>%
left_join(network_meta %>% select(!!sym(dataset_id), cell_type, super_class),
by = setNames(dataset_id, "post"))
print(top_post)
cat("\nTop 10 strongest presynaptic partners (neuron-level):\n")
top_pre <- edgelist_strong %>%
filter(post %in% ca_alpha1t_ids) %>%
group_by(pre) %>%
summarise(
total_count = sum(count),
mean_norm = mean(norm),
n_connections = n()
) %>%
arrange(desc(mean_norm)) %>%
head(10) %>%
left_join(network_meta %>% select(!!sym(dataset_id), cell_type, super_class),
by = setNames(dataset_id, "pre"))
print(top_pre)
# Cell type-level statistics
cat("\n\n")
cat("\nNetwork Statistics (Cell Type-Level):\n")
cat("=" , rep("=", 50), "\n", sep = "")
# Add cell type metadata to edgelist
edgelist_with_types <- edgelist_strong %>%
left_join(network_meta %>% select(!!sym(dataset_id), cell_type, is_ca_alpha1t),
by = setNames(dataset_id, "pre")) %>%
rename(pre_celltype = cell_type, pre_is_ca = is_ca_alpha1t) %>%
left_join(network_meta %>% select(!!sym(dataset_id), cell_type, is_ca_alpha1t),
by = setNames(dataset_id, "post")) %>%
rename(post_celltype = cell_type, post_is_ca = is_ca_alpha1t)
# Collapse to cell type level
celltype_stats <- edgelist_with_types %>%
group_by(pre_celltype, post_celltype, pre_is_ca, post_is_ca) %>%
summarise(
mean_norm = mean(norm, na.rm = TRUE),
total_count = sum(count, na.rm = TRUE),
n_neuron_connections = n(),
.groups = "drop"
) %>%
filter(!is.na(mean_norm))
cat("Total cell type connections:", nrow(celltype_stats), "\n")
cat("Average synapse count per cell type pair:", mean(celltype_stats$total_count), "\n")
cat("Average normalised weight per cell type pair:", mean(celltype_stats$mean_norm), "\n")
cat("Average neuron connections per cell type pair:", mean(celltype_stats$n_neuron_connections), "\n")
# Cell type partner statistics
ca_celltype_outputs <- celltype_stats %>%
filter(pre_is_ca) %>%
distinct(pre_celltype) %>%
nrow()
ca_celltype_inputs <- celltype_stats %>%
filter(post_is_ca) %>%
distinct(post_celltype) %>%
nrow()
cat("\nCa-α1T+ cell types with outputs:", ca_celltype_outputs, "\n")
cat("Ca-α1T+ cell types with inputs:", ca_celltype_inputs, "\n")
# Top cell type partners
cat("\nTop 10 strongest postsynaptic partner cell types:\n")
top_post_celltype <- celltype_stats %>%
filter(pre_is_ca) %>%
arrange(desc(mean_norm)) %>%
head(10) %>%
select(post_celltype, mean_norm, total_count, n_neuron_connections)
print(top_post_celltype)
cat("\nTop 10 strongest presynaptic partner cell types:\n")
top_pre_celltype <- celltype_stats %>%
filter(post_is_ca) %>%
arrange(desc(mean_norm)) %>%
head(10) %>%
select(pre_celltype, mean_norm, total_count, n_neuron_connections)
print(top_pre_celltype)
}
##
## Network Statistics (Neuron-Level):
## ===================================================
## Total connections: 77905
## Average synapse count: 22.37877
## Average normalised weight: 0.1097034
##
## Ca-α1T+ neurons with outputs: 5322
## Ca-α1T+ neurons with inputs: 6627
##
## Top 10 strongest postsynaptic partners (neuron-level):
## # A tibble: 10 × 6
## post total_count mean_norm n_connections cell_type super_class
## <chr> <int> <dbl> <int> <chr> <chr>
## 1 720575940608048818 9 1 1 R7 sensory
## 2 720575940608368050 9 1 1 R7 sensory
## 3 720575940613188778 19 1 1 R7 sensory
## 4 720575940613194226 23 1 1 R8 sensory
## 5 720575940614720040 6 1 1 R8 sensory
## 6 720575940615021352 14 1 1 R8 sensory
## 7 720575940615969898 10 1 1 R8 sensory
## 8 720575940616743436 34 1 1 R7 sensory
## 9 720575940620682320 24 1 1 R7 sensory
## 10 720575940621179329 19 1 1 R7 sensory
##
## Top 10 strongest presynaptic partners (neuron-level):
## # A tibble: 10 × 6
## pre total_count mean_norm n_connections cell_type super_class
## <chr> <int> <dbl> <int> <chr> <chr>
## 1 720575940636953445 9 0.6 1 L1 optic_lobe_…
## 2 720575940625365577 130 0.592 2 L1 optic_lobe_…
## 3 720575940612744085 36 0.581 1 L2 optic_lobe_…
## 4 720575940624865463 29 0.547 1 L1 optic_lobe_…
## 5 720575940615608482 85 0.532 2 L1 optic_lobe_…
## 6 720575940605418668 287 0.528 1 L2 optic_lobe_…
## 7 720575940626763210 185 0.520 1 L2 optic_lobe_…
## 8 720575940617341652 97 0.519 1 L2 optic_lobe_…
## 9 720575940623312391 177 0.509 1 L2 optic_lobe_…
## 10 720575940638509631 54 0.505 1 L2 optic_lobe_…
##
##
##
## Network Statistics (Cell Type-Level):
## ===================================================
## Total cell type connections: 395
## Average synapse count per cell type pair: 4413.716
## Average normalised weight per cell type pair: 0.08035699
## Average neuron connections per cell type pair: 197.2278
##
## Ca-α1T+ cell types with outputs: 7
## Ca-α1T+ cell types with inputs: 7
##
## Top 10 strongest postsynaptic partner cell types:
## # A tibble: 10 × 4
## post_celltype mean_norm total_count n_neuron_connections
## <chr> <dbl> <int> <int>
## 1 R7 0.670 17354 784
## 2 R8 0.638 14563 759
## 3 L5 0.615 32 1
## 4 L3 0.496 54 5
## 5 R1-6 0.479 23 3
## 6 L4 0.458 22 1
## 7 <NA> 0.456 20 2
## 8 R7 0.414 22 3
## 9 R8 0.339 26 3
## 10 R8 0.225 97 14
##
## Top 10 strongest presynaptic partner cell types:
## # A tibble: 10 × 4
## pre_celltype mean_norm total_count n_neuron_connections
## <chr> <dbl> <int> <int>
## 1 L2 0.334 195640 1615
## 2 L1 0.299 49716 1404
## 3 L1 0.271 175754 1586
## 4 <NA> 0.269 85 3
## 5 <NA> 0.245 334 3
## 6 R1-6 0.170 8 1
## 7 Tm2 0.143 13 1
## 8 L2 0.136 636 16
## 9 T4a 0.136 15 2
## 10 T1 0.133 8 1
if (exists("edgelist_strong") && nrow(edgelist_strong) > 0) {
# Filter for only UPSTREAM connections to Ca-α1T+ neurons
# (i.e., post must be Ca-α1T+)
edgelist_upstream <- edgelist_strong %>%
filter(post %in% ca_alpha1t_ids)
cat("Filtering for upstream connections to Ca-α1T+ neurons:\n")
cat(" Total strong connections:", nrow(edgelist_strong), "\n")
cat(" Upstream to Ca-α1T+:", nrow(edgelist_upstream), "\n")
# Add cell type and neurotransmitter metadata to edgelist
edgelist_with_types <- edgelist_upstream %>%
left_join(network_meta %>% select(!!sym(dataset_id), cell_type, is_ca_alpha1t, neurotransmitter_predicted),
by = setNames(dataset_id, "pre")) %>%
rename(pre_celltype = cell_type, pre_is_ca = is_ca_alpha1t, pre_nt = neurotransmitter_predicted) %>%
left_join(network_meta %>% select(!!sym(dataset_id), cell_type, is_ca_alpha1t),
by = setNames(dataset_id, "post")) %>%
rename(post_celltype = cell_type, post_is_ca = is_ca_alpha1t)
# Filter for only presynaptic neurons with GABA or glutamate
edgelist_inhibitory <- edgelist_with_types %>%
filter(tolower(pre_nt) %in% c("gaba", "glutamate"))
cat(" After filtering for GABA/glutamate presynaptic:", nrow(edgelist_inhibitory), "\n")
# Count neurons per cell type (both pre and post)
neurons_per_celltype <- network_meta %>%
group_by(cell_type) %>%
summarise(n_neurons = n(), .groups = "drop")
# Replace NA cell types with "unknown"
edgelist_inhibitory <- edgelist_inhibitory %>%
mutate(
pre_celltype = ifelse(is.na(pre_celltype), "unknown", pre_celltype),
post_celltype = ifelse(is.na(post_celltype), "unknown", post_celltype),
pre_is_ca = ifelse(is.na(pre_is_ca), FALSE, pre_is_ca),
post_is_ca = ifelse(is.na(post_is_ca), FALSE, post_is_ca),
pre_nt = ifelse(is.na(pre_nt), "unknown", pre_nt)
)
# Collapse to cell type level
# Calculate mean norm and total count for each cell type pair
celltype_edgelist <- edgelist_inhibitory %>%
group_by(pre_celltype, post_celltype, pre_is_ca, post_is_ca) %>%
summarise(
mean_norm = mean(norm, na.rm = TRUE),
total_count = sum(count, na.rm = TRUE),
n_neuron_connections = n(),
.groups = "drop"
) %>%
filter(!is.na(mean_norm)) # Remove any NA weights
cat("\nCell type-level network:\n")
cat(" Collapsed to cell type-cell type connections:", nrow(celltype_edgelist), "\n")
# Create cell type vertex data
# Get unique cell types from both pre and post
all_celltypes <- unique(c(celltype_edgelist$pre_celltype, celltype_edgelist$post_celltype))
# Determine if each cell type contains Ca-α1T+ neurons and count neurons
celltype_vertices <- data.frame(
cell_type = all_celltypes,
stringsAsFactors = FALSE
) %>%
left_join(
edgelist_inhibitory %>%
group_by(post_celltype) %>%
summarise(has_ca_alpha1t = any(post_is_ca), .groups = "drop"),
by = c("cell_type" = "post_celltype")
) %>%
left_join(neurons_per_celltype, by = "cell_type") %>%
mutate(
has_ca_alpha1t = ifelse(is.na(has_ca_alpha1t), FALSE, has_ca_alpha1t),
n_neurons = ifelse(is.na(n_neurons), 1, n_neurons) # Default to 1 if unknown
)
# Create igraph object at cell type level
g_celltype <- graph_from_data_frame(
d = celltype_edgelist %>% select(pre_celltype, post_celltype, mean_norm, total_count, n_neuron_connections),
directed = TRUE,
vertices = celltype_vertices
)
cat("\nCell type network graph created:\n")
cat(" Cell types (vertices):", vcount(g_celltype), "\n")
cat(" Cell type connections (edges):", ecount(g_celltype), "\n")
cat(" Upstream inhibitory cell types:", sum(!V(g_celltype)$has_ca_alpha1t), "\n")
cat(" Ca-α1T+ target cell types:", sum(V(g_celltype)$has_ca_alpha1t), "\n")
# Calculate node degrees
V(g_celltype)$in_degree <- degree(g_celltype, mode = "in")
V(g_celltype)$out_degree <- degree(g_celltype, mode = "out")
V(g_celltype)$total_degree <- degree(g_celltype, mode = "all")
cat("\nDegree statistics:\n")
cat(" Mean in-degree:", round(mean(V(g_celltype)$in_degree), 2), "\n")
cat(" Mean out-degree:", round(mean(V(g_celltype)$out_degree), 2), "\n")
# Store for plotting
g <- g_celltype
} else {
cat("No network data available for visualisation\n")
}
## Filtering for upstream connections to Ca-α1T+ neurons:
## Total strong connections: 77905
## Upstream to Ca-α1T+: 24140
## After filtering for GABA/glutamate presynaptic: 9110
##
## Cell type-level network:
## Collapsed to cell type-cell type connections: 120
##
## Cell type network graph created:
## Cell types (vertices): 69
## Cell type connections (edges): 120
## Upstream inhibitory cell types: 62
## Ca-α1T+ target cell types: 7
##
## Degree statistics:
## Mean in-degree: 1.74
## Mean out-degree: 1.74
if (exists("g") && vcount(g) > 0) {
# Set node colors based on Ca-α1T expression
V(g)$color <- ifelse(V(g)$has_ca_alpha1t, "#E41A1C", "#377EB8")
V(g)$color_label <- ifelse(V(g)$has_ca_alpha1t, "Ca-α1T+ cell type", "GABA/Glu input")
# Set node sizes based on NUMBER OF NEURONS in the cell type
# Use sqrt to scale area proportionally to neuron count
V(g)$size <- sqrt(V(g)$n_neurons) * 2
# Set edge widths based on LOG of mean normalised weight
# Add small constant to avoid log(0)
E(g)$width <- log10(E(g)$mean_norm + 0.001) - min(log10(E(g)$mean_norm + 0.001))
E(g)$width <- (E(g)$width / max(E(g)$width)) * 8 + 1 # Scale to 1-9 range
# Set edge colors based on mean normalised weight using viridis
edge_colors <- viridis(100)
norm_scaled <- cut(E(g)$mean_norm, breaks = 100, labels = FALSE)
E(g)$color <- edge_colors[norm_scaled]
# Use Fruchterman-Reingold layout for cell type networks
set.seed(42)
layout_fr <- layout_with_fr(g)
# Plot with base R graphics
par(mar = c(2, 2, 3, 2), bg = "white")
plot(g,
layout = layout_fr,
vertex.color = V(g)$color,
vertex.size = 3,#V(g)$size,
vertex.label = V(g)$name, # Show all cell type names
vertex.label.cex = 0.7,
vertex.label.color = "black",
vertex.label.family = "sans",
vertex.frame.color = "white",
edge.arrow.size = 0.5,
edge.arrow.width = 1,
edge.width = E(g)$width,
edge.color = E(g)$color,
edge.curved = 0.2,
main = "Inhibitory/Glutamatergic Inputs to Ca-α1T+ Cell Types (FAFB)",
sub = paste("Cell type network:", vcount(g), "cell types,", ecount(g),
"connections | Node size = # neurons | Edge width = log(mean norm)")
)
# Add legend
legend("topleft",
legend = c("Ca-α1T+ cell type", "GABA/Glu input",
"Node size = # neurons", "Edge width = log(mean norm)"),
col = c("#E41A1C", "#377EB8", NA, NA),
pch = c(19, 19, NA, NA),
pt.cex = 2,
bty = "n",
cex = 1.1)
# Save as PNG
png(file.path(img_dir, "ca_alpha1t_inhibitory_network.png"),
width = 14, height = 14, units = "in", res = 300)
par(mar = c(2, 2, 3, 2), bg = "white")
plot(g,
layout = layout_fr,
vertex.color = V(g)$color,
vertex.size = 3,#V(g)$size,
vertex.label = V(g)$name, # Show all cell type names
vertex.label.cex = 0.7,
vertex.label.color = "black",
vertex.label.family = "sans",
vertex.frame.color = "white",
edge.arrow.size = 0.5,
edge.arrow.width = 1,
edge.width = E(g)$width,
edge.color = E(g)$color,
edge.curved = 0.2,
main = "Inhibitory/Glutamatergic Inputs to Ca-α1T+ Cell Types (FAFB)",
sub = paste("Cell type network:", vcount(g), "cell types,", ecount(g),
"connections | Node size = # neurons | Edge width = log(mean norm)")
)
legend("topleft",
legend = c("Ca-α1T+ cell type", "GABA/Glu input",
"Node size = # neurons", "Edge width = log(mean norm)"),
col = c("#E41A1C", "#377EB8", NA, NA),
pch = c(19, 19, NA, NA),
pt.cex = 2,
bty = "n",
cex = 1.1)
dev.off()
cat("\nInhibitory network plot saved to:", file.path(img_dir, "ca_alpha1t_inhibitory_network.png"), "\n")
}
##
## Inhibitory network plot saved to: images/tutorial_05/ca_alpha1t_inhibitory_network.png
if (exists("network_meta") && nrow(network_meta) > 0) {
# Count cell types in network, separated by Ca-α1T expression
celltype_counts <- network_meta %>%
filter(!is.na(cell_type)) %>%
count(cell_type, is_ca_alpha1t) %>%
arrange(desc(n))
# Plot cell type composition
p_celltypes <- ggplot(celltype_counts,
aes(x = reorder(cell_type, n), y = n, fill = is_ca_alpha1t)) +
geom_col() +
scale_fill_manual(
values = c("FALSE" = "#377EB8", "TRUE" = "#E41A1C"),
labels = c("FALSE" = "Partner", "TRUE" = "Ca-α1T+"),
name = "Neuron Type"
) +
coord_flip() +
labs(
title = "Cell Types in Ca-α1T+ Network",
subtitle = paste("Cell types of", sum(celltype_counts$n), "neurons in the strong connectivity network"),
x = "Cell Type",
y = "Number of Neurons"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
axis.text = element_text(size = 10),
legend.position = "top"
)
save_plot(p_celltypes, "ca_alpha1t_network_celltypes", width = 10, height = 8)
ggplotly(p_celltypes)
}
In this tutorial, we:
if (exists("network_meta") && exists("edgelist_strong")) {
cat("\n=== SUMMARY STATISTICS ===\n\n")
cat("Transcriptomics:\n")
cat(" Cell types with high Ca-α1T (≥90th percentile):", length(high_expressors), "\n")
cat("\nConnectome mapping:\n")
cat(" Ca-α1T+ neurons found in FAFB:", length(ca_alpha1t_ids), "\n")
cat("\nConnectivity network:\n")
cat(" Total neurons in network:", vcount(g), "\n")
cat(" Total strong connections:", ecount(g), "\n")
cat(" Network density:", round(edge_density(g), 4), "\n")
cat("\nBiological interpretation:\n")
cat(" Ca-α1T is expressed in specific visual system cell types\n")
cat(" These neurons form a highly connected subnetwork\n")
cat(" T-type calcium channels likely enable graded signaling in these circuits\n")
}
##
## === SUMMARY STATISTICS ===
##
## Transcriptomics:
## Cell types with high Ca-α1T (≥90th percentile): 7
##
## Connectome mapping:
## Ca-α1T+ neurons found in FAFB: 6815
##
## Connectivity network:
## Total neurons in network: 69
## Total strong connections: 120
## Network density: 0.0256
##
## Biological interpretation:
## Ca-α1T is expressed in specific visual system cell types
## These neurons form a highly connected subnetwork
## T-type calcium channels likely enable graded signaling in these circuits
Possible extensions of this analysis: - Compare Ca-α1T+ networks across different datasets (BANC, maleCNS) - Analyse other ion channel genes (e.g., Ca-α1D, other VGCCs) - Investigate functional implications using neurotransmitter predictions - Examine morphological properties of Ca-α1T+ neurons - Compare expression patterns across developmental stages
Tutorial complete! 🎉